Conversation

@infil00p (Contributor) commented Dec 6, 2025

No description provided.

infil00p and others added 5 commits January 2, 2026 18:42
Integrate ONNX Runtime as an alternative inference backend alongside llama.cpp,
enabling GPU-accelerated inference for vision-language models with platform-specific
execution providers (DirectML, CUDA, CoreML).

Changes:
- Add ONNX Runtime dependencies with platform-specific features (DirectML/CUDA/CoreML)
- Create vlm_onnx.rs module for ONNX inference engine supporting SmolVLM models
- Extend ModelManager with ONNX model download functionality from HuggingFace
- Add Tauri commands for ONNX model operations (download, load, generate)
- Update UI with ONNX Models tab in model selection modal
- Add quantization selector (Q4, Q8, FP16) for ONNX models
- Configure downloads for SmolVLM2-256M-Video-Instruct with correct HF repo structure
  (ONNX files in onnx/ subdirectory, config/tokenizer at root)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
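
A minimal sketch of the session setup this commit describes, using the `ort` crate (the common Rust binding for ONNX Runtime). Builder names follow the ort 2.x release-candidate API, and module paths vary by version; in the actual `vlm_onnx.rs` each provider would additionally sit behind its platform-specific feature flag, so treat this as an assumption-laden illustration rather than the repository's code:

```rust
use ort::execution_providers::{
    CUDAExecutionProvider, CoreMLExecutionProvider, DirectMLExecutionProvider,
};
use ort::session::Session;

/// Build a session that prefers a platform GPU execution provider.
/// ort registers providers in order and falls back to the CPU provider
/// when one is unavailable at runtime.
fn load_session(model_path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers([
            DirectMLExecutionProvider::default().build(), // Windows
            CUDAExecutionProvider::default().build(),     // Linux + NVIDIA
            CoreMLExecutionProvider::default().build(),   // macOS
        ])?
        .commit_from_file(model_path)
}
```
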
Implement logic to detect model backend type and route to appropriate inference engine,
enabling seamless switching between llama.cpp and ONNX Runtime backends.

Changes:
- Update model loading useEffect to check backend type
- Route to load_onnx_model for ONNX models, load_model for llama.cpp
- Disable audio capability check for ONNX models (not yet supported)
- Add backend detection in handleSendMessage for inference routing
- Convert image data appropriately for each backend:
  - RGB array for llama.cpp (existing)
  - JPEG bytes for ONNX Runtime (new)
- Call generate_onnx_response for ONNX, generate_response for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
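
On the Rust side of that conversion, the JPEG bytes for the ONNX path can be produced with the `image` crate already in the dependency tree. A hedged sketch against the image 0.24 API; the function name and quality value are illustrative, not taken from the repository:

```rust
use image::codecs::jpeg::JpegEncoder;
use image::ColorType;

/// Re-encode a raw RGB frame as JPEG bytes for the ONNX backend.
/// The llama.cpp path keeps receiving the untouched RGB array.
fn rgb_to_jpeg(rgb: &[u8], width: u32, height: u32) -> image::ImageResult<Vec<u8>> {
    let mut jpeg = Vec::new();
    // Quality 85 is an assumed setting, not a value from the commit.
    let encoder = JpegEncoder::new_with_quality(&mut jpeg, 85);
    encoder.encode(rgb, width, height, ColorType::Rgb8)?;
    Ok(jpeg)
}
```
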
Update list_downloaded_models to include ONNX models by checking for
_onnx_ pattern in directory names and verifying .onnx files exist.
Add frontend model ID normalization so that ONNX models from HuggingFace
repos match their normalized directory names (slashes, dashes, and case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
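
A sketch of the two rules this commit describes, assuming ONNX downloads land in directories whose names carry an `_onnx_` marker, with `.onnx` files either at the top level or in the `onnx/` subdirectory mentioned earlier; the exact normalization scheme is a guess:

```rust
use std::fs;
use std::path::Path;

/// A directory counts as a downloaded ONNX model when its name contains
/// the "_onnx_" marker and it (or its onnx/ subdirectory) actually holds
/// at least one .onnx file.
fn is_onnx_model_dir(dir: &Path) -> bool {
    let name = dir.file_name().and_then(|n| n.to_str()).unwrap_or("");
    if !name.contains("_onnx_") {
        return false;
    }
    let has_onnx_file = |d: &Path| {
        fs::read_dir(d)
            .map(|entries| {
                entries
                    .flatten()
                    .any(|e| e.path().extension().map_or(false, |ext| ext == "onnx"))
            })
            .unwrap_or(false)
    };
    has_onnx_file(dir) || has_onnx_file(&dir.join("onnx"))
}

/// Assumed frontend normalization: lower-case the HuggingFace repo id and
/// map slashes and dashes to underscores so it matches directory names.
fn normalize_model_id(repo_id: &str) -> String {
    repo_id.to_lowercase().replace('/', "_").replace('-', "_")
}
```
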
- Add SmolVLMImageProcessor with 4x4 grid splitting + global image
  (17 frames total, matching SmolVLM's expected input format)
- Add proper prompt expansion with grid tokens
  (<fake_token_around_image>, <row_X_col_Y>, <global-img>)
- Auto-detect layer count from decoder model inputs
- Support multiple EOS tokens (ID 2, <end_of_utterance>, </s>)
- Fix pixel coordinate order to match model expectations
- Add config.json path to model loading for HuggingFace config
- Add test example for VLM inference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
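
A condensed sketch of the 4x4-grid-plus-global split with the `image` crate; the tile size and resampling filter are assumptions, not values from the actual `SmolVLMImageProcessor`:

```rust
use image::{imageops::FilterType, DynamicImage};

/// Split an image into a 4x4 grid of tiles plus a downscaled copy of the
/// whole image, yielding the 17 frames SmolVLM expects.
fn split_into_frames(img: &DynamicImage, tile: u32) -> Vec<DynamicImage> {
    // Resize so the image divides evenly into 4x4 cells of `tile` pixels.
    let grid = img.resize_exact(tile * 4, tile * 4, FilterType::Triangle);
    let mut frames = Vec::with_capacity(17);
    for row in 0..4 {
        for col in 0..4 {
            frames.push(grid.crop_imm(col * tile, row * tile, tile, tile));
        }
    }
    // Frame 17 is the global image, announced by <global-img> in the prompt.
    frames.push(img.resize_exact(tile, tile, FilterType::Triangle));
    frames
}
```

In the expanded prompt, each tile is then introduced by its `<row_X_col_Y>` token between `<fake_token_around_image>` markers, with `<global-img>` preceding the final frame.
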
Remove the duplicate image = "0.25" entry from dev-dependencies, which
conflicted with image = "0.24" in the main dependencies and caused an
E0464 compilation error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
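
The shape of the fix in Cargo.toml, with surrounding entries omitted:

```toml
[dependencies]
image = "0.24"

[dev-dependencies]
# image = "0.25"  # removed: the second copy conflicted with the 0.24
#                 # entry above and triggered E0464; tests now build
#                 # against the same 0.24 version
```
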
@infil00p marked this pull request as ready for review January 4, 2026
@infil00p merged commit 00dff51 into main Jan 4, 2026
2 checks passed